26 research outputs found

    Planning for Decentralized Control of Multiple Robots Under Uncertainty

    We describe a probabilistic framework for synthesizing control policies for general multi-robot systems, given environment and sensor models and a cost function. Decentralized, partially observable Markov decision processes (Dec-POMDPs) are a general model of decision processes where a team of agents must cooperate to optimize some objective (specified by a shared reward or cost function) in the presence of uncertainty, but where communication limitations mean that the agents cannot share their state, so execution must proceed in a decentralized fashion. While Dec-POMDPs are typically intractable to solve for real-world problems, recent research on the use of macro-actions in Dec-POMDPs has significantly increased the size of problem that can be practically solved as a Dec-POMDP. We describe this general model, and show how, in contrast to most existing methods that are specialized to a particular problem class, it can synthesize control policies that use whatever opportunities for coordination are present in the problem, while balancing uncertainty in outcomes, sensor information, and information about other agents. We use three variations on a warehouse task to show that a single planner of this type can generate cooperative behavior using task allocation, direct communication, and signaling, as appropriate.

    Deep Radial-Basis Value Functions for Continuous Control

    A core operation in reinforcement learning (RL) is finding an action that is optimal with respect to a learned value function. This operation is often challenging when the learned value function takes continuous actions as input. We introduce deep radial-basis value functions (RBVFs): value functions learned using a deep network with a radial-basis function (RBF) output layer. We show that the maximum action-value with respect to a deep RBVF can be approximated easily and accurately. Moreover, deep RBVFs can represent any true value function owing to their support for universal function approximation. We extend the standard DQN algorithm to continuous control by endowing the agent with a deep RBVF. We show that the resultant agent, called RBF-DQN, significantly outperforms value-function-only baselines, and is competitive with state-of-the-art actor-critic algorithms. Comment: In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI
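The idea of approximating the maximum action-value through an RBF output layer can be illustrated with a minimal sketch. The abstract does not give the exact functional form, so the normalized (softmax-weighted) RBF below, the smoothing parameter `beta`, and the fixed `centroids` and `values` arrays are all assumptions for illustration; in a deep RBVF these quantities would be produced by a network conditioned on the state.

```python
import numpy as np

def rbvf_q(action, centroids, values, beta=5.0):
    """Assumed normalized-RBF action-value: a weighted average of centroid values.

    Weights decay with the distance between `action` and each centroid.
    """
    dists = np.linalg.norm(centroids - action, axis=1)
    logits = -beta * dists
    weights = np.exp(logits - logits.max())  # subtract max for numerical stability
    weights /= weights.sum()
    return float(weights @ values)

def approx_max_q(centroids, values, beta=5.0):
    """Approximate max over continuous actions by evaluating Q only at the centroids."""
    qs = [rbvf_q(c, centroids, values, beta) for c in centroids]
    best = int(np.argmax(qs))
    return qs[best], centroids[best]

# Hypothetical one-dimensional action space with three centroids.
centroids = np.array([[-1.0], [0.0], [1.0]])
values = np.array([0.2, 1.0, 0.5])
q_max, a_star = approx_max_q(centroids, values)
```

Because the output is a convex combination of the centroid values, it can never exceed the largest of them, which is why checking only the centroids gives a cheap approximation of the maximum in this sketch.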

    Perceptual Context in Cognitive Hierarchies

    Cognition depends not only on bottom-up sensor feature abstraction, but also on contextual information being passed top-down. Context is higher-level information that helps to predict belief states at lower levels. The main contribution of this paper is to provide a formalisation of perceptual context and its integration into a new process model for cognitive hierarchies. Several simple instantiations of a cognitive hierarchy are used to illustrate the role of context. Notably, we demonstrate the use of context in a novel approach to visually track the pose of rigid objects with just a 2D camera.

    Correction: Konidaris et al. Dating of the Lower Pleistocene Vertebrate Site of Tsiotra Vryssi (Mygdonia Basin, Greece): Biochronology, Magnetostratigraphy, and Cosmogenic Radionuclides. Quaternary 2021, 4, 1

    Background and scope: The late Villafranchian large mammal age (~2.0–1.2 Ma) of the Early Pleistocene is a crucial interval of time for mammal/hominin migrations and faunal turnovers in western Eurasia. However, an accurate chronological framework for the Balkans and adjacent territories is still missing, preventing pan-European biogeographic correlations and schemes. In this article, we report the first detailed chronological scheme for the late Villafranchian of southeastern Europe through a comprehensive and multidisciplinary dating approach (biochronology, magnetostratigraphy, and cosmogenic radionuclides) of the recently discovered Lower Pleistocene vertebrate site Tsiotra Vryssi (TSR) in the Mygdonia Basin, Greece. Results: The minimum burial ages (1.88 ± 0.16 Ma, 2.10 ± 0.18 Ma, and 1.98 ± 0.18 Ma) provided by the method of cosmogenic radionuclides indicate that the normal magnetic polarity identified below the fossiliferous layer correlates to the Olduvai subchron (1.95–1.78 Ma; C2n). Therefore, an age younger than 1.78 Ma is indicated for the fossiliferous layer, which was deposited during reverse polarity chron C1r. These results are in agreement with the biochronological data, which further point to an upper age limit at ~1.5 Ma. Overall, an age between 1.78 and ~1.5 Ma (i.e., within the first part of the late Villafranchian) is proposed for the TSR fauna. Conclusions: Our results not only provide age constraints for the local mammal faunal succession, thus allowing for a better understanding of faunal changes within the same sedimentary basin, but also contribute to improving correlations on a broader scale, leading to more accurate biogeographic, palaeoecological, and taphonomic interpretations
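The consistency between the reported burial ages and the Olduvai subchron can be checked with simple interval arithmetic. This is only a sketch: treating each "±" value as a plain symmetric interval is an assumption, and the abstract does not describe the authors' actual statistical treatment. The function name `overlaps` is hypothetical.

```python
# Ages in Ma, taken from the abstract above.
OLDUVAI = (1.78, 1.95)  # Olduvai subchron (C2n)
burial_ages = [(1.88, 0.16), (2.10, 0.18), (1.98, 0.18)]  # (age, uncertainty)

def overlaps(age, err, interval):
    """Return True if [age - err, age + err] intersects the given interval."""
    lo, hi = interval
    return age - err <= hi and age + err >= lo

# Assumption: each uncertainty is read as a simple symmetric interval.
consistent = [overlaps(age, err, OLDUVAI) for age, err in burial_ages]
print(consistent)
```

Under that assumption, each of the three age intervals intersects the 1.95–1.78 Ma range, which matches the correlation stated in the abstract.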

    A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems

    Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to "real world" events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and systems will instead need to adapt to novel distributions and tasks while deployed. This critical gap may be addressed through the development of "Lifelong Learning" systems that are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability. Unfortunately, efforts to improve these capabilities are typically treated as distinct areas of research that are assessed independently, without regard to the impact of each separate capability on other aspects of the system. We instead propose a holistic approach, using a suite of metrics and an evaluation framework to assess Lifelong Learning in a principled way that is agnostic to specific domains or system techniques. Through five case studies, we show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems. We highlight how the proposed suite of metrics quantifies performance trade-offs present during Lifelong Learning system development: both the widely discussed Stability-Plasticity dilemma and the newly proposed relationship between Sample Efficient and Robust Learning. Further, we make recommendations for the formulation and use of metrics to guide the continuing development of Lifelong Learning systems and assess their progress in the future. Comment: To appear in Neural Network

    Autonomous Robot Skill Acquisition

    Among the most impressive aspects of human intelligence is skill acquisition: the ability to identify important behavioral components, retain them as skills, refine them through practice, and apply them in new task contexts. Skill acquisition underlies both our ability to choose to spend time and effort to specialize at particular tasks, and our ability to collect and exploit previous experience to become able to solve harder and harder problems over time with less and less cognitive effort. Hierarchical reinforcement learning provides a theoretical basis for skill acquisition, including principled methods for learning new skills and deploying them during problem solving. However, existing work focuses largely on small, discrete problems. This dissertation addresses the question of how we scale such methods up to high-dimensional, continuous domains, in order to design robots that are able to acquire skills autonomously. This presents three major challenges; we introduce novel methods addressing each of these challenges. First, how does an agent operating in a continuous environment discover skills? Although the literature contains several methods for skill discovery in discrete environments, it offers none for the general continuous case. We introduce skill chaining, a general skill discovery method for continuous domains. Skill chaining incrementally builds a skill tree that allows an agent to reach a solution state from any of its start states by executing a sequence (or chain) of acquired skills. We empirically demonstrate that skill chaining can improve performance over monolithic policy learning in the Pinball domain, a challenging dynamic and continuous reinforcement learning problem. Second, how do we scale up to high-dimensional state spaces? While learning in relatively small domains is generally feasible, it becomes exponentially harder as the number of state variables grows. We introduce abstraction selection, an efficient algorithm for selecting skill-specific, compact representations from a library of available representations when creating a new skill. Abstraction selection can be combined with skill chaining to solve hard tasks by breaking them up into chains of skills, each defined using an appropriate abstraction. We show that abstraction selection selects an appropriate representation for a new skill using very little sample data, and that this leads to significant performance improvements in the Continuous Playroom, a relatively high-dimensional reinforcement learning problem. Finally, how do we obtain good initial policies? The amount of experience required to learn a reasonable policy from scratch in most interesting domains is unrealistic for robots operating in the real world. We introduce CST, an algorithm for rapidly constructing skill trees (with appropriate abstractions) from sample trajectories obtained via human demonstration, a feedback controller, or a planner. We use CST to construct skill trees from human demonstration in the Pinball domain, and to extract a sequence of low-dimensional skills from demonstration trajectories on a mobile robot. The resulting skills can be reliably reproduced using a small number of example trajectories. Finally, these techniques are applied to build a mobile robot control system for the uBot-5, resulting in a mobile robot that is able to acquire skills autonomously. We demonstrate that this system is able to use skills acquired in one problem to more quickly solve a new problem.

    Axial Line Placement in Deformed Urban Grids

    The problem of placing axial lines in configurations of convex, non-overlapping polygons originates in the technique of space syntax analysis, which is used in town planning to describe and analyse architectural structures. Unfortunately, the general problem has been found to be NP-complete, because of the possibility of configurations in which local choices have to be made that affect the global optimality of the solution. Because of this, previous research has focused either on finding special cases where an exact solution can be obtained in polynomial time, or on heuristic algorithms that find approximate solutions in polynomial time. Recently

    Intrinsically Motivated Reinforcement Learning: A Promising Framework For Developmental Robot Learning

    One of the primary challenges of developmental robotics is the question of how to learn and represent increasingly complex behavior in a self-motivated, open-ended way. Barto, Singh, and Chentanez (Barto, Singh, & Chentanez 2004; Singh, Barto, & Chentanez 2004) have recently presented an algorithm for intrinsically motivated reinforcement learning that strives to achieve broad competence in an environment in a task-nonspecific manner by incorporating internal reward to build a hierarchical collection of skills. This paper suggests that with its emphasis on task-general, self-motivated, and hierarchical learning, intrinsically motivated reinforcement learning is an obvious choice for organizing behavior in developmental robotics. We present additional preliminary results from a gridworld abstraction of a robot environment and advocate a layered learning architecture for applying the algorithm on a physically embodied system.

    Visual Transfer For Reinforcement Learning Via Wasserstein Domain Confusion

    We introduce Wasserstein Adversarial Proximal Policy Optimization (WAPPO), a novel algorithm for visual transfer in Reinforcement Learning that explicitly learns to align the distributions of extracted features between a source and target task. WAPPO approximates and minimizes the Wasserstein-1 distance between the distributions of features from source and target domains via a novel Wasserstein Confusion objective. WAPPO outperforms the prior state-of-the-art in visual transfer and successfully transfers policies across Visual Cartpole and both the easy and hard settings of 16 OpenAI Procgen environments.
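To make the Wasserstein-1 distance concrete, here is a minimal sketch for the special case of two equal-sized, one-dimensional samples, where the distance reduces to the mean absolute difference between sorted values. This is not the WAPPO objective, which the abstract describes as an adversarial approximation over learned features; the function name `wasserstein1_1d` and the toy feature arrays are hypothetical.

```python
import numpy as np

def wasserstein1_1d(x, y):
    """Wasserstein-1 distance between two equal-sized 1D empirical samples.

    In one dimension the optimal transport plan matches sorted samples in order.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y)))

# Toy "features": the target sample is the source sample shifted by 0.5.
source_feats = np.array([0.0, 1.0, 2.0, 3.0])
target_feats = source_feats + 0.5
print(wasserstein1_1d(source_feats, target_feats))  # prints 0.5
```

A pure shift of the distribution by 0.5 yields a distance of 0.5, and identical samples yield 0, which is the sense in which minimizing this quantity aligns the two feature distributions.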